Soap Berry - LTEM - EcoDomain Comparison

Published

2023-02-26

1 Summary of Soap Berry LTEM protocol

1.1 Basic protocol

Areas rich in soapberries is located for permanent monitoring. In each area, 10 robust bushes are chosen, and two stems on each plant are chosen for sampling. The stems and bushes are marked with permanent tags so that they can be revisited each year.

There are three measures taken in this protocol.

  • Berry count. The number of berries produced on the stem is recorded as an index of soapberry production.
  • Stem diameter. The diameter (in millimeters) of the stem near its base is measured.
  • Mean berry weight. A collection of 25-50 ripe red berries is obtained in August and weighed so the average wet weight of a single berry from each area is obtained.

If the tagged stem has died (or is damaged or the tag on the stem has “disappeared”), a new stem is chosen. This may be from a new bush or the same bush. If the stem has been browsed, then no count is conducted on this stem this year.

If the tagged bush has died (or the tags on all of the stems have “disappeared”), a new bush is selected for subsequent monitoring.

1.2 Database structure

The relevant fields extracted from the database are:

The relevant fields from the database are:

  • STUDY_AREA_NAME. The name of the study area.
  • SAMPLE_STATION_LABEL. The bush/stem label.
  • Date. The date the data was collected. The Year is extracted from this date.
  • Berry_Count. The number of soap berries on this stem. If the stem is browed (or damaged) a missing value should be entered here and not the value of 0.
  • Stem_Diameter. The diameter (mm) of the stem.
  • Average_Weight. The average weight of a sample of berries is collected. Notice that there is only ONE mean weight found per year and so this value
    is replicated on every stem line of the sheet. The sample size used to determine the weight is in a separate column. It is assumed that 0’s have been already imputed on the database.

The longitude and latitude are used to find the EcoDomain of each study area.

2 Reading and checking the data

2.1 Extract all LTEM data

The database was queried for all observations from this protocol in the Province.

The following surveys and years of surveys were found.

Number of records in each study area x year combination 
                      Year
STUDY_AREA_NAME_shrt   2013 2014 2015 2016 2017 2018 2019 2020 2021
  Ancient Forest/Chun     0    0    0    0   20   20    0    0    0
  Beatton Park            0   20   20   20    0    0    0    0    0
  Bowser Ecological Re   20   20   20   20   20   20   20   20   20
  Crooked River Park     20   20   20   40    0    0    0   23    0
  E.C. Manning Park      20   20   20    0    0   20   20   20    0
  Elk Falls Provincial    0    0    0    0    0   20   20   20   20
  Eskers Park             0    0   20   20   20    0    0    0    0
  Fiordland Conservanc    0    0    0    0    0    0    0    0   20
  Golden Ears Park        0    0   20   20   19   20   20    0    0
  Goldstream Park         0    0    0    0   20   20    0    0   24
  John Dean Park          0    0   20   20   20   20    0    0   20
  Mount Robson Park       0    0   20   20    0   20   20   30   22
  Mount Seymour Park      0    0   20   20    0   22    0    0    0
  Muncho Lake Park        0    0   20   19    0   19    0   19    0
  Okanagan Mountain Pa    0   20    0   20    0   20    0   20    0
  Purcell Wilderness C    0    0    0   20   20   20   20   20   20
  Schoen Lake Park        0    0   20   20   20   20   18   20    0
  Skagit Valley Park      0    0    0   20    0   20   20    0   20
  Strathcona Park         0   20    0   20   20   20   20   20   20
  Tarahne Park            0    0    0    0    0    0    0   20    0
  Tatshenshini Alsek P    0    0   20    0    0    0   20    0    0
  Yaaguun Gandlaay Her    0   20   20   20   20   25   20   20   24

It is not necessary that every STUDY_AREA be measured in every year and the number of measurements at each STUDY_AREA can vary within years and across years.

If STUDY_AREA is only measured in one year, it does not provide any information on the trends and can be dropped. Similary, STUDY_AREAS with only 2 years of data, provide no information on the uncertainty; they can be retained or dropped.

STUDY_AREAs dropped for insuffient data are:

Study areas dropped because of insufficient years of data

Study Area

# years of data

Fiordland Conservancy

1

Tarahne Park

1

2.2 Ecodomain of each study area

The Ecodomain of each study area was obtained by looking up the mean latitude/longitude of observations for each STUDY_AREA in the Ecoprovices provided in the bcmaps package. In some cases, latitude/longitude is not provided as noted below:

Site missing long/lat data 
   STUDY_AREA_NAME_shrt LONGITUDE_DD LATITUDE_DD
1  Ancient Forest/Chun           NaN         NaN
3  Bowser Ecological Re          NaN         NaN
5     E.C. Manning Park          NaN         NaN
8      Golden Ears Park          NaN         NaN
13     Muncho Lake Park          NaN         NaN
14 Okanagan Mountain Pa          NaN         NaN
15 Purcell Wilderness C          NaN         NaN
17   Skagit Valley Park          NaN         NaN
19 Tatshenshini Alsek P          NaN         NaN

How is it possible that long/lat is not know for some study area?

The classification of each remaining STUDY_AREA into its EcoDomain is:

                      ECODOMAIN
STUDY_AREA_NAME_shrt   CoastMtn HumConHigh North
  Beatton Park                0          0     1
  Crooked River Park          0          1     0
  Elk Falls Provincial        1          0     0
  Eskers Park                 0          1     0
  Goldstream Park             1          0     0
  John Dean Park              1          0     0
  Mount Robson Park           0          1     0
  Mount Seymour Park          1          0     0
  Schoen Lake Park            1          0     0
  Strathcona Park             1          0     0
  Yaaguun Gandlaay Her        1          0     0
# check if at least 2 eco domains at at least 4 study areas and at least 3 years x 2 ecodomains x 4 study area
Records in remaining sites where ECODOMAIN could be determined
                      ECODOMAIN
STUDY_AREA_NAME_shrt   CoastMtn HumConHigh North
  Beatton Park                0          0    60
  Crooked River Park          0        123     0
  Elk Falls Provincial       80          0     0
  Eskers Park                 0         60     0
  Goldstream Park            64          0     0
  John Dean Park            100          0     0
  Mount Robson Park           0        132     0
  Mount Seymour Park         62          0     0
  Schoen Lake Park          118          0     0
  Strathcona Park           140          0     0
  Yaaguun Gandlaay Her      169          0     0

2.3 Checking species code

The species code should be the same across for all data values.

            Year
SPECIES_CODE 2013 2014 2015 2016 2017 2018 2019 2020 2021
     NULL       0   12    2    5    2    0    0    0    0
     SHEPCAN   20   40   98   98   20   20   20   53   22
     VACCOVL    0    0    0   20   20   20   18   20    0
     VACCPAR    0   28   60   77   78  127   60   60  108
*** WARNING *** More than one species name found - OK if some are NULL 

Is only SHEPCAN to be analyzed in this document??

3 Multi-Site Analysis

This design has multiple stems on bushes that are repeatedly measured over time. Please refer to the Fitting Trends with Complex Study Designs document in the CommonFile directory for information on fitting trends with complex study designs.

All analyses were done using the R (R Core Team, 2022) analysis system. All plots are also saved as separate *png files for inclusion into other reports.

3.1 Mean berry weight.

This measurement is taken at the STUDY_AREA_NAME level and so there is one measurement available per STUDY_AREA_NAME.year. Notice that this value is replicated multiple times in the database for each individual stem. These are NOT real replicated readings but only an artifact of the database so some care is needed to extract only a single value per STUDY_AREA_NAME.year.

3.1.1 Summarize to STUDY_AREA_NAME.year level

The data is first summarized to the STUDY_AREA_NAME.year level by extracting a single value for the mean berry weight from the repeated values in the database.

The summary data is:

                  STUDY_AREA_NAME STUDY_AREA_NAME_shrt  ECODOMAIN Year Mean.Berry.Weight..gm.
1                    Beatton Park         Beatton Park      North 2014              0.1380000
2                    Beatton Park         Beatton Park      North 2015                    NaN
3                    Beatton Park         Beatton Park      North 2016                    NaN
4              Crooked River Park   Crooked River Park HumConHigh 2013              0.1687000
5              Crooked River Park   Crooked River Park HumConHigh 2014              0.0870000
6              Crooked River Park   Crooked River Park HumConHigh 2015                    NaN
7              Crooked River Park   Crooked River Park HumConHigh 2016              0.1253846
8              Crooked River Park   Crooked River Park HumConHigh 2020              1.4782609
9       Elk Falls Provincial Park Elk Falls Provincial   CoastMtn 2018              0.2290000
10      Elk Falls Provincial Park Elk Falls Provincial   CoastMtn 2019              0.3300000
11      Elk Falls Provincial Park Elk Falls Provincial   CoastMtn 2020              0.3750000
12      Elk Falls Provincial Park Elk Falls Provincial   CoastMtn 2021             12.6426316
13                    Eskers Park          Eskers Park HumConHigh 2015              0.1600000
14                    Eskers Park          Eskers Park HumConHigh 2016              0.1500000
15                    Eskers Park          Eskers Park HumConHigh 2017              0.1200000
16                Goldstream Park      Goldstream Park   CoastMtn 2017              0.1500000
17                Goldstream Park      Goldstream Park   CoastMtn 2018              0.0800000
18                Goldstream Park      Goldstream Park   CoastMtn 2021              1.6250000
19                 John Dean Park       John Dean Park   CoastMtn 2015                    NaN
20                 John Dean Park       John Dean Park   CoastMtn 2016              0.0500000
21                 John Dean Park       John Dean Park   CoastMtn 2017              0.0500000
22                 John Dean Park       John Dean Park   CoastMtn 2018              0.0270000
23                 John Dean Park       John Dean Park   CoastMtn 2021              1.6500000
24              Mount Robson Park    Mount Robson Park HumConHigh 2015                    NaN
25              Mount Robson Park    Mount Robson Park HumConHigh 2016                    NaN
26              Mount Robson Park    Mount Robson Park HumConHigh 2018                    NaN
27              Mount Robson Park    Mount Robson Park HumConHigh 2019              0.2700000
28              Mount Robson Park    Mount Robson Park HumConHigh 2020              2.0666667
29              Mount Robson Park    Mount Robson Park HumConHigh 2021              1.0909091
30             Mount Seymour Park   Mount Seymour Park   CoastMtn 2015              0.0480000
31             Mount Seymour Park   Mount Seymour Park   CoastMtn 2016              0.3400000
32             Mount Seymour Park   Mount Seymour Park   CoastMtn 2018              1.6363636
33               Schoen Lake Park     Schoen Lake Park   CoastMtn 2015                    NaN
34               Schoen Lake Park     Schoen Lake Park   CoastMtn 2016              1.5500000
35               Schoen Lake Park     Schoen Lake Park   CoastMtn 2017                    NaN
36               Schoen Lake Park     Schoen Lake Park   CoastMtn 2018              1.3500000
37               Schoen Lake Park     Schoen Lake Park   CoastMtn 2019              2.0555556
38               Schoen Lake Park     Schoen Lake Park   CoastMtn 2020              1.7500000
39                Strathcona Park      Strathcona Park   CoastMtn 2014                    NaN
40                Strathcona Park      Strathcona Park   CoastMtn 2016              0.3500000
41                Strathcona Park      Strathcona Park   CoastMtn 2017              0.0496000
42                Strathcona Park      Strathcona Park   CoastMtn 2018              1.0000000
43                Strathcona Park      Strathcona Park   CoastMtn 2019              2.5000000
44                Strathcona Park      Strathcona Park   CoastMtn 2020              1.5000000
45                Strathcona Park      Strathcona Park   CoastMtn 2021              1.7500000
46 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2014              0.3000000
47 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2015              0.1400000
48 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2016              0.1900000
49 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2017              0.2600000
50 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2018              1.9600000
51 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2019              1.3500000
52 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2020              1.2500000
53 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her   CoastMtn 2021              1.5789474

Why is mean berry weight missing for so many site years?

3.1.2 Preliminary plot

A summary plot of the mean berry weight in each year for each STUDY_AREA and an the trends over time is shown in Figure 1.

Figure 1: Summary plot of the data. The data is analyzed on the logarithmic scale.

Strange value in Coast Mountains ECODOMAIN that needs investigation

We notice that the North EcoDomain only has a single site with a single year and so cannot be used in the analysis. In general, each EcoDomain needs at least one site with at least 3 years of data.

Following ECODOMAINS removed because not enought sites with enough years
  ECODOMAIN n.sites.at.least.3.years
3     North                        0

3.1.3 Model

A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each transect is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.

The analysis is done on the logarithmic scale so that trends over have a simple interpretation.

The model, in a short-hand notation is:

\[log(MeanBerryWeight) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]

where

  • \(log(MeanBerryWeight)\) is logarithm of the mean berry weight in that year for a STUDY_AREA.
  • \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
  • \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
  • \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
  • \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
  • \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.

The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may yield larger berries, on average).

Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year.

The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).

boundary (singular) fit: see help('isSingular')
Type III Analysis of Variance Table with Satterthwaite's method
                 Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
Year            21.4024 21.4024     1 23.874 25.9609 3.315e-05 ***
ECODOMAINF       0.7112  0.7112     1 23.877  0.8627    0.3623    
Year:ECODOMAINF  0.7114  0.7114     1 23.875  0.8629    0.3622    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Mean.Berry.Weight..gm.) ~ Year + ECODOMAINF + Year:ECODOMAINF +      (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 |      YearF:ECODOMAINF)
   Data: berry.weight

REML criterion at convergence: 125.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.96500 -0.58151 -0.02038  0.54093  2.07313 

Random effects:
 Groups           Name        Variance Std.Dev.
 YearF:ECODOMAINF (Intercept) 0.04882  0.2210  
 STUDY_AREA_NAME  Year        0.00000  0.0000  
 STUDY_AREA_NAMEF (Intercept) 0.43231  0.6575  
 Residual                     0.82441  0.9080  
Number of obs: 42, groups:  YearF:ECODOMAINF, 16; STUDY_AREA_NAME, 10; STUDY_AREA_NAMEF, 10

Fixed effects:
                            Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)               -1018.0709   193.1253     9.0659  -5.272 0.000501 ***
Year                          0.5041     0.0957     9.0619   5.268 0.000504 ***
ECODOMAINFHumConHigh        313.6068   337.6401    23.8771   0.929 0.362278    
Year:ECODOMAINFHumConHigh    -0.1555     0.1674    23.8747  -0.929 0.362229    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Year   ECODOM
Year        -1.000              
ECODOMAINFH -0.572  0.572       
Y:ECODOMAIN  0.572 -0.572 -1.000
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

3.1.5 Estimated slopes

The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).

The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 2.

Table 2: Estimated slope in each EcoDomain

Eco Domain

Slope

SE

df

95% LCL

95% UCL

Grouping

HumConHigh

0.3486

0.1373

34.2

0.0697

0.6276

1

CoastMtn

0.5041

0.0957

9.1

0.2878

0.7203

1

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

Degrees-of-freedom method: satterthwaite

significance level used: alpha = 0.05

NOTE: If two or more means share the same grouping symbol,
then we cannot show them to be different.
But we also did not show them to be the same.

The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of 0.504 in the CoastMtn EcoDomain can be interpretted as an approximate 50.4% change in the mean berry weight per year.

The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.362. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.

3.1.6 Mean slope over all EcoDomains

The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 3.

Table 3: Estimated mean slope over all EcoDomains

Mean slope

Slope

SE

df

95% LCL

95% UCL

p.value

overall

0.4264

0.0837

23.9

0.2536

0.5991

p < .001

Results are averaged over the levels of: ECODOMAINF

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

The overall average slope over all EcoDomain is 0.426 (SE 0.084 with a p-value of p < .001. This is the same p-value from the ANOVA table for the Year term.

The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.

3.1.7 Summary plot

Figure 2 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.

Figure 2: Summary plot of the trend in mean berry weight.

3.1.8 Estimated variance component

The estimated variance components are shown in Table 4.

Table 4: Estimated variance components

Component

Interaction

SD

YearF:ECODOMAIN

0.221

STUDY_AREA_NAME

Year

0.000

STUDY_AREA_NAME

0.658

Residual

0.908

The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.

Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 3).

Figure 3: Comparing the individual slopes within each EcoRegion

The year specific effects vary slightly among the EcoDomains as shown in Figure 4.

Figure 4: Plot of year specific effects

We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!

3.1.9 Residual plots

Residual plots are presented in Figure 5.

Figure 5: Model fit diagnostic plots from trend in mean transect counts

In the upper left corner is a plot of residuals vs.  the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.

3.2 Stem Diameter.

This measurement is taken at the stem level and so there is one value per stem/bush/year. The same stem is repeatedly measured over time, but stems may leave the protocol (damaged or dead) or be added to the protocol (replacement stem) over time. All of the models below automatically will account for stems that are removed or added as long as each stem has a unique label within a site.

The first few records are:

  STUDY_AREA_NAME_shrt ECODOMAIN Year   BUSH            STEM Branch.Diameter..mm.
1         Beatton Park     North 2015 plant2 plant2-branch34                  9.5
2         Beatton Park     North 2016 plant2 plant2-branch35                  4.2
3         Beatton Park     North 2014 plant2 plant2-branch35                  7.3
4         Beatton Park     North 2016 plant4 plant4-branch39                  4.8
5         Beatton Park     North 2014 plant2 plant2-branch34                  5.5
6         Beatton Park     North 2016 plant5 plant5-branch41                  9.4

Missing values are excluded. If there are some records with a reported diameter of 0 – these likely need to be checked.

Number of records prior to removing NA
[1] 1108  129


Number of records after to removing NA
[1] 1042  129


Record with 0 branch diameter that need attending 
     STUDY_AREA_NAME Year    BUSH             STEM Branch.Diameter..mm.
775 Schoen Lake Park 2015 plant10 plant10-branch19                    0

3.2.1 Preliminary plot

A summary plot of the mean stem diameter in each year for each STUDY_AREA and an the trends over time is shown in Figure 6.

Figure 6: Summary plot of the data. The data is analyzed on the logarithmic scale.

Need to check out the odd value for Coast Mountain stem diameter.

3.2.2 Model

A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each BUSH and STEM is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.

The model, in a short-hand notation is:

\[log(BranchDiameter) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{Bush(R)} +\mathit{Stem(R)} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]

where

  • \(log(BranchDiameter)\) is logarithm of the branch diameter on that bush and stem in that year.
  • \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
  • \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
  • \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
  • \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
  • \(\mathit{Stem(R)}\) represents the (random) stem effect;
  • \(\mathit{Bush(R)}\) represents the (random) bush effect;
  • \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.

The The \(\mathit{Stem(R)}\) and \(\mathit{Bush(R)}\) terms allows for the fact that bush and stem specific conditions may tend to affect the branch diameter consistently over time. The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may lead to larger branch diameters).

Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year.

The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).

Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq   Mean Sq NumDF  DenDF F value Pr(>F)
Year            0.007614 0.0076137     1 8.6927  0.0570 0.8169
ECODOMAINF      0.030907 0.0154537     2 8.3902  0.1156 0.8922
Year:ECODOMAINF 0.034967 0.0174836     2 8.5302  0.1308 0.8791
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Branch.Diameter..mm.) ~ Year + ECODOMAINF + Year:ECODOMAINF +      (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 |      BUSHF) + (1 | STEMF) + (1 | YearF:ECODOMAINF)
   Data: stemd.df

REML criterion at convergence: 1237.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7633 -0.4212 -0.0145  0.4129 11.9217 

Random effects:
 Groups           Name        Variance Std.Dev.
 STEMF            (Intercept)  0.01706 0.1306  
 BUSHF            (Intercept)  0.03129 0.1769  
 YearF:ECODOMAINF (Intercept)  0.01826 0.1351  
 STUDY_AREA_NAME  Year         0.20597 0.4538  
 STUDY_AREA_NAMEF (Intercept) 63.92096 7.9951  
 Residual                      0.13367 0.3656  
Number of obs: 1041, groups:  STEMF, 296; BUSHF, 126; YearF:ECODOMAINF, 20; STUDY_AREA_NAME, 11; STUDY_AREA_NAMEF, 11

Fixed effects:
                           Estimate Std. Error        df t value Pr(>|t|)
(Intercept)                3.557824   3.051622  8.233804   1.166    0.276
Year                      -0.096638   0.173193  8.240467  -0.558    0.592
ECODOMAINFHumConHigh      -2.654857   5.565536  8.197576  -0.477    0.646
ECODOMAINFNorth           -0.279352   8.722323  8.580866  -0.032    0.975
Year:ECODOMAINFHumConHigh  0.158359   0.316163  8.233350   0.501    0.630
Year:ECODOMAINFNorth      -0.003138   0.498551  8.828143  -0.006    0.995

Correlation of Fixed Effects:
              (Intr) Year   ECODOMAINFH ECODOMAINFN Y:ECODOMAINFH
Year          -0.019                                             
ECODOMAINFH   -0.548  0.010                                      
ECODOMAINFN   -0.350  0.007  0.192                               
Y:ECODOMAINFH  0.010 -0.548 -0.018      -0.004                   
Y:ECODOMAINFN  0.007 -0.347 -0.004      -0.046       0.190       

3.2.4 Estimated slopes

The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).

The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 6.

Table 6: Estimated slope in each EcoDomain

Eco Domain

Slope

SE

df

95% LCL

95% UCL

Grouping

North

-0.0998

0.4675

8.9

-1.1589

0.9594

1

CoastMtn

-0.0966

0.1732

8.2

-0.4940

0.3007

1

HumConHigh

0.0617

0.2645

8.2

-0.5453

0.6687

1

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

Degrees-of-freedom method: satterthwaite

P value adjustment: tukey method for comparing a family of 3 estimates

significance level used: alpha = 0.05

NOTE: If two or more means share the same grouping symbol,
then we cannot show them to be different.
But we also did not show them to be the same.

The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of -0.097 in the CoastMtn EcoDomain can be interpretted as an approximate -9.7% change in the mean call rate per year.

The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.879. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.

3.2.5 Mean slope over all EcoDomains

The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 7.

Table 7: Estimated mean slope over all EcoDomains

Mean slope

Slope

SE

df

95% LCL

95% UCL

p.value

overall

-0.0449

0.1881

8.7

-0.4728

0.3830

p = 0.817

Results are averaged over the levels of: ECODOMAINF

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

The overall average slope over all EcoDomain is -0.045 (SE 0.188 with a p-value of p = 0.817. This is the same p-value from the ANOVA table for the Year term.

The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.

3.2.6 Summary plot

Figure 7 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.

Figure 7: Summary plot of the trend in mean branch diameter.

3.2.7 Estimated variance component

The estimated variance components are shown in Table 8.

Table 8: Estimated variance components

Component

Interaction

SD

STEM

0.131

BUSH

0.177

YearF:ECODOMAIN

0.135

STUDY_AREA_NAME

Year

0.454

STUDY_AREA_NAME

7.995

Residual

0.366

The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The BUSH and STEM components represents the variation in intercepts among the bushes and stems within the same STUDY_AREA. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.

Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 8).

Figure 8: Comparing the individual slopes within each EcoRegion

The year specific effects vary slightly among the EcoDomains as shown in Figure 9.

Figure 9: Plot of year specific effects

We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!

3.2.8 Residual plots

Residual plots are presented in Figure 10.

Figure 10: Model fit diagnostic plots from trend in mean transect counts

In the upper left corner is a plot of residuals vs.  the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.

3.3 Berry Count.

This measurement is taken at the stem level and so there is one value per stem/plant/year. The same stem/plant is repeated measured over time. All of the models below automatically will account for branches that are removed or added as long as each stem has a unique label within a site. The models for the berry count are similar to those from the stem diameter except that the counts may be somewhat smallish. The average count is about 5 or less, then a Poisson regression can, in theory, be used. However, if there are several random effects, then a Poisson mixed effects model is extremely difficult to fit. However, for larger counts, a linear mixed model on the log(counts+0.5) will work well and avoidd many of the problems in dealing with generalized linear mixed models.

The first few records are:

  STUDY_AREA_NAME_shrt ECODOMAIN Year   BUSH            STEM Berry.Count
1         Beatton Park     North 2015 plant2 plant2-branch34           2
2         Beatton Park     North 2016 plant2 plant2-branch35           4
3         Beatton Park     North 2014 plant2 plant2-branch35          18
4         Beatton Park     North 2016 plant4 plant4-branch39          54
5         Beatton Park     North 2014 plant2 plant2-branch34          24
6         Beatton Park     North 2016 plant5 plant5-branch41          46

Missing values are excluded.

Number of records prior to removing NA
[1] 1108  129


Number of records after to removing NA
[1] 1065  129

3.3.1 Preliminary plot

A summary plot of the mean berry count in each year for each STUDY_AREA and an the trends over time is shown in Figure 11.

Figure 11: Summary plot of the data. The data is analyzed on the logarithmic scale.

3.3.2 Model

A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each BUSH and STEM is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.

We add an offset of 0.5 to avoid taking logarithms of 0.

The model, in a short-hand notation is:

\[log(BerryCount+0.5) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{Bush(R)} +\mathit{Stem(R)} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]

where

  • \(log(BerryCount+0.05)\) is logarithm of the berry count on that bush and stem in that year. An offset of 0.5 is added to avoid taking logarithms of 0.
  • \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
  • \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
  • \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
  • \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
  • \(\mathit{Stem(R)}\) represents the (random) stem effect;
  • \(\mathit{Bush(R)}\) represents the (random) bush effect;
  • \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.

The The \(\mathit{Stem(R)}\) and \(\mathit{Bush(R)}\) terms allows for the fact that bush and stem specific conditions may tend to affect the berry count consistently over time. The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may lead to more berries in all bushes and stems).

Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year. Some caution is needed if any of the values are 0 as log(0) is not defined. In these cases, a small constant (typically ½ of the smallest positive value in the dataset) is added to all values before the analysis proceeds.

The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).

Type III Analysis of Variance Table with Satterthwaite's method
                 Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)
Year            0.01873 0.01873     1 13.566  0.0107 0.9191
ECODOMAINF      2.63346 1.31673     2 12.374  0.7533 0.4912
Year:ECODOMAINF 2.44830 1.22415     2 11.280  0.7003 0.5168
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Berry.Count + 0.5) ~ Year + ECODOMAINF + Year:ECODOMAINF +      (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 |      BUSHF) + (1 | STEMF) + (1 | YearF:ECODOMAINF)
   Data: bcount.df

REML criterion at convergence: 3946.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.93625 -0.72864  0.03296  0.72011  2.84822 

Random effects:
 Groups           Name        Variance Std.Dev.
 STEMF            (Intercept)  0.3093  0.5561  
 BUSHF            (Intercept)  0.1744  0.4176  
 YearF:ECODOMAINF (Intercept)  1.3294  1.1530  
 STUDY_AREA_NAME  Year         0.1812  0.4256  
 STUDY_AREA_NAMEF (Intercept) 54.2329  7.3643  
 Residual                      1.7480  1.3221  
Number of obs: 1065, groups:  STEMF, 301; BUSHF, 126; YearF:ECODOMAINF, 20; STUDY_AREA_NAME, 11; STUDY_AREA_NAMEF, 11

Fixed effects:
                          Estimate Std. Error       df t value Pr(>|t|)
(Intercept)                3.25485    4.26309 12.97403   0.763    0.459
Year                      -0.08658    0.24334 11.62430  -0.356    0.728
ECODOMAINFHumConHigh      -7.59966    6.77161 10.91995  -1.122    0.286
ECODOMAINFNorth            4.44935   15.25021 14.25908   0.292    0.775
Year:ECODOMAINFHumConHigh  0.42759    0.39149  9.70779   1.092    0.301
Year:ECODOMAINFNorth      -0.27360    0.97463 13.47607  -0.281    0.783

Correlation of Fixed Effects:
              (Intr) Year   ECODOMAINFH ECODOMAINFN Y:ECODOMAINFH
Year          -0.563                                             
ECODOMAINFH   -0.630  0.355                                      
ECODOMAINFN   -0.280  0.158  0.176                               
Y:ECODOMAINFH  0.350 -0.622 -0.433      -0.098                   
Y:ECODOMAINFN  0.141 -0.250 -0.089      -0.755       0.155       

3.3.4 Estimated slopes

The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).

The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 10.

Table 10: Estimated slope in each EcoDomain

Eco Domain

Slope

SE

df

95% LCL

95% UCL

Grouping

North

-0.3602

0.9438

13.5

-2.3919

1.6715

1

CoastMtn

-0.0866

0.2433

11.6

-0.6187

0.4455

1

HumConHigh

0.3410

0.3067

8.3

-0.3615

1.0435

1

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

Note: contrasts are still on the 0.5(mu + 0.5) scale

Degrees-of-freedom method: satterthwaite

P value adjustment: tukey method for comparing a family of 3 estimates

significance level used: alpha = 0.05

NOTE: If two or more means share the same grouping symbol,
then we cannot show them to be different.
But we also did not show them to be the same.

The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of -0.087 in the CoastMtn EcoDomain can be interpretted as an approximate -8.7% change in the mean call rate per year.

The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.517. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.

3.3.5 Mean slope over all EcoDomains

The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 11.

Table 11: Estimated mean slope over all EcoDomains

Mean slope

Slope

SE

df

95% LCL

95% UCL

p.value

overall

-0.0353

0.3406

13.6

-0.7679

0.6974

p = 0.919

Results are averaged over the levels of: ECODOMAINF

Degrees-of-freedom method: satterthwaite

Confidence level used: 0.95

The overall average slope over all EcoDomain is -0.035 (SE 0.341 with a p-value of p = 0.919. This is the same p-value from the ANOVA table for the Year term.

The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.

3.3.6 Summary plot

Figure 12 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.

Figure 12: Summary plot of the trend in mean branch diameter.

3.3.7 Estimated variance component

The estimated variance components are shown in Table 12.

Table 12: Estimated variance components

Component

Interaction

SD

STEM

0.556

BUSH

0.418

YearF:ECODOMAIN

1.153

STUDY_AREA_NAME

Year

0.426

STUDY_AREA_NAME

7.364

Residual

1.322

The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The BUSH and STEM components represents the variation in intercepts among the bushes and stems within the same STUDY_AREA representing bush- and stem-specific factors that affect the berry count. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.

Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 13).

Figure 13: Comparing the individual slopes within each EcoRegion

The year specific effects vary slightly among the EcoDomains as shown in Figure 14.

Figure 14: Plot of year specific effects

We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!

3.3.8 Residual plots

Residual plots are presented in Figure 15.

Figure 15: Model fit diagnostic plots from trend in mean transect counts

In the upper left corner is a plot of residuals vs.  the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.

4 Power/sample size analysis

There are two hypotheses of most interest:

  • What is the mean trend over all ecoregions. This is the Year effect test previously seen.
  • Are there differences in the trend among ecoregions? This is the Year:Ecoregion test previously seen.

Because the model has many random effects, the simr package (Green and McLeod 2016) will be used to estimate power at various sample sizes (number of years of monitoring). The simr package estimates the power by simulating many datasets with the same set of random effects, doing a model fit on each simulated dataset, and computing the proportion of the simulations where the p-value for the effect of interest is less than the \(\alpha=0.5\) level to estimate the power to detect this effect.

The berry protocol measures mean berry weight, number of berries, and mean stem diameter. The individual site analyses showed that berry counts are highly variable and so any power to detect trends will be small. Conversely, the stem diameter values have very little variation over time. Consequently, we will only do a power analysis on the mean berry weight response.

4.1 Power to detect mean trend.

We will first fit a simpler model that drops the Year:Ecoregion term in the model and fit a model with parallel slopes across the ecoregions (Figure 16).

Figure 16: Estimated power to detect mean trend

The lines may appear to be “wiggly” or decline when moving from left to right, but this is an aretefact of a small number of simulations.

The Year Effect is the proportional change/year that is of interest. For example, a Year Effect of .02, implies a 2% change/year.

A target power of .80 (when alpha=0.05) is recommended (horizontal line on plots). This may not be achieved for small year effects with less than 10 year of sampling.

5 Summary

This analysis examines trends across EcoDomains. The model has an overall average trend, a EcoDomain specific trend, and random trends for each Study Area among the EcoDomain trends. With many study areas and many years of data collected in each study area, the design has a high power to detect even a moderate trend over time.

6 References

Kuznetsova A, Brockhoff PB, Christensen RHB (2017). “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software, 82, 1-26. doi:10.18637/jss.v082.i13 https://doi.org/10.18637/jss.v082.i13.

Lenth R (2023). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.8.4-1, https://CRAN.R-project.org/package=emmeans.

R Core Team (2032). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.